MATCH: An Architecture for Multimodal Dialogue Systems
Authors
Abstract
Mobile interfaces need to allow the user and system to adapt their choice of communication modes according to user preferences, the task at hand, and the physical and social environment. We describe a multimodal application architecture which combines finite-state multimodal language processing, a speech-act based multimodal dialogue manager, dynamic multimodal output generation, and user-tailored text planning to enable rapid prototyping of multimodal interfaces with flexible input and adaptive output. Our testbed application MATCH (Multimodal Access To City Help) provides a mobile multimodal speech-pen interface to restaurant and subway information for New York City.

1 Multimodal Mobile Information Access

In urban environments, tourists and residents alike need access to a complex and constantly changing body of information regarding restaurants, theatre schedules, transportation topology and timetables. This information is most valuable if it can be delivered effectively while mobile, since places close and plans change. Mobile information access devices (PDAs, tablet PCs, next-generation phones) offer limited screen real estate and no keyboard or mouse, making complex graphical interfaces cumbersome. Multimodal interfaces can address this problem by enabling speech and pen input and output that combines speech and graphics (see (André, 2002) for a detailed overview of previous work on multimodal input and output). Since mobile devices are used in different physical and social environments, for different tasks, and by different users, they need to be both flexible in input and adaptive in output. Users need to be able to provide input in whichever mode or combination of modes is most appropriate, and system output should be dynamically tailored so that it is maximally effective given the situation and the user's preferences.

We present our testbed multimodal application MATCH (Multimodal Access To City Help) and the general-purpose multimodal architecture underlying it, which is designed for highly mobile applications, enables flexible multimodal input, and provides flexible user-tailored multimodal output.

Figure 1: MATCH running on Fujitsu PDA

Highly mobile MATCH is a working city guide and navigation system that currently enables mobile users to access restaurant and subway information for New York City (NYC). MATCH runs standalone on a Fujitsu pen computer (Figure 1), and can also run in client-server mode across a wireless network.

Flexible multimodal input Users interact with a graphical interface displaying restaurant listings and a dynamic map showing locations and street information. They are free to provide input using speech, by drawing on the display with a stylus, or by using synchronous multimodal combinations of the two modes. For example, a user might ask to see cheap ...
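To make the flexible-input idea concrete, the sketch below shows one way a deictic word in the speech stream ("here", "that") could be bound to a temporally nearby pen gesture on the map. This is not MATCH's implementation: the paper describes finite-state multimodal language processing over the two input streams, whereas this toy uses simple timestamp proximity, and every class, function, and parameter name here is an illustrative assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SpeechToken:
    word: str
    start_ms: int      # when the word was spoken
    end_ms: int

@dataclass
class GestureEvent:
    kind: str          # e.g. "point", "area", "line" drawn with the stylus
    referent: str      # entity or map region the gesture selects
    time_ms: int

@dataclass
class Interpretation:
    command: str
    arguments: dict

DEICTICS = {"this", "that", "these", "those", "here", "there"}

def fuse(speech: List[SpeechToken],
         gestures: List[GestureEvent],
         window_ms: int = 1500) -> Interpretation:
    """Bind each deictic word to the nearest gesture within window_ms.

    A toy stand-in for multimodal integration: real finite-state
    approaches compose speech and gesture representations, but the
    core idea of aligning the two streams in time is the same.
    """
    args = {}
    for tok in speech:
        if tok.word.lower() in DEICTICS:
            nearest: Optional[GestureEvent] = None
            best = window_ms + 1
            for g in gestures:
                dist = abs(g.time_ms - tok.start_ms)
                if dist < best:
                    best, nearest = dist, g
            if nearest is not None:
                args[tok.word.lower()] = nearest.referent
    return Interpretation(command=" ".join(t.word for t in speech),
                          arguments=args)

if __name__ == "__main__":
    speech = [SpeechToken("show", 0, 300),
              SpeechToken("cheap", 310, 600),
              SpeechToken("restaurants", 610, 1100),
              SpeechToken("here", 1110, 1300)]
    gestures = [GestureEvent("area", "chelsea", 1200)]
    print(fuse(speech, gestures))
    # Interpretation(command='show cheap restaurants here',
    #                arguments={'here': 'chelsea'})
```

The point of the sketch is only that neither input stream is complete on its own: the spoken command supplies the predicate and constraints, the pen gesture supplies the referent, and the integration step produces a single interpretation for the dialogue manager.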
Similar references
Evaluating Dialogue Strategies in Multimodal Dialogue Systems
Previous research suggests that multimodal dialogue systems providing both speech and pen input, and outputting a combination of spoken language and graphics, are more robust than unimodal systems based on speech or graphics alone (André, 2002; Oviatt, 1999). Such systems are complex to build and significant research and evaluation effort must typically be expended to generate well-tuned modu...
How was your day? An architecture for multimodal ECA systems
Multimodal conversational dialogue systems consisting of numerous software components create challenges for the underlying software architecture and development practices. Typically, such systems are built on separate, often preexisting components developed by different organizations and integrated in a highly iterative way. The traditional dialogue system pipeline is not flexible enough to add...
Mobile Architecture for Distributed Multimodal Dialogues
There is an increasing need for mobile spoken dialogue systems. Mobile devices, such as smartphones and personal digital assistants, can be used to implement efficient speech-based and multimodal interfaces. Currently, the development of such applications lacks suitable tools. We introduce a general system architecture for mobile spoken dialogue systems. The architecture is a mobile version of th...
The SmartKom Architecture: A Framework for Multimodal Dialogue Systems
SmartKom provides an adaptive and reusable dialogue shell for multimodal interaction, which has been employed successfully to realize fully-fledged prototype systems for various application scenarios. Taking the perspective of system architects, we will give a review of the overall design and specific architecture framework being applied within SmartKom. The basic design principles underlying o...
Implementing an Intelligent Multimedia Presentation Planner using an Agent Based Architecture
This paper describes the implementation of an intelligent Multimedia Presentation Planner (MPP) in a multimodal dialogue system. Following the development of an architecture based on the Standard Reference Model, designed specifically for FOCAL, this implementation has been integrated with an earlier spoken dialogue system. The design now ensures that the framework is portable to other multimod...